class: center, middle, inverse, title-slide .title[ # Three common mistakes in statistics and how to avoid them ] .author[ ### Elizabeth Pankratz ] .institute[ ### Department of Psychology
The University of Edinburgh ] --- .pull-left[ ## The mistake ] .pull-right[ ## How you'll avoid it ] -- .pull-left[
**A common R programming mistake:** Letting R treat all variables that consist of numbers as numeric. ] .pull-right[ ] -- .pull-left[
**An advanced statistical mistake:** Modelling categorical, ordinal data as if it were numeric. ] .pull-right[ ] -- .pull-left[
**A foundational statistical mistake:** Interpreting a significant *p*-value as evidence that an effect exists. ] .pull-right[ ] --- class: inverse, middle, center
# The data we'll use

---

## The SMARVUS dataset (Terry et al., 2023)

.center[SMARVUS = **S**tatistics and **M**athematics **A**nxieties and **R**elated **V**ariables in **U**niversity **S**tudents]

--

.pull-left[

A survey of *n* = 18,841 students (mostly Psychology UGs) from 35 countries.

Students rated their anxiety from 1 (no anxiety) to 5 (a great deal of anxiety) in scenarios like:

- Studying for a statistics test.
- Interpreting the meaning of a table in a journal article.
- **Going to ask my statistics teacher for individual help with material I am having difficulty understanding.** →

]

--

.pull-right[

<img src="data:image/png;base64,#demo_files/figure-html/bar-aggregated-1.png" width="504" style="display: block; margin: auto;" />

]

---

## Why Likert scale ratings are not continuous numeric

.center[  ]

---

## And yet...

--

.pull-left[

Reeder et al. (2017) in **Journal of Memory and Language.**

]

.pull-right[

Elazar et al. (2022) in **Cognitive Science.**

<br>

Harrigan et al. (2022) in **Language.**

]

---

## R will keep numeric-looking data numeric

--

.pull-left[

``` r
head(anx, 3)
```

```
## # A tibble: 3 × 3
##   unique_id gender       rating
##   <chr>     <fct>         <dbl>
## 1 01057178  Male/Man          3
## 2 0300b5f2  Female/Woman      1
## 3 03f6503b  Female/Woman      3
```

]

--

.pull-right[

If we allow R's default behaviour, then we can do naughty things with categorical variables:

``` r
mean(anx$rating)
```

```
## [1] 2.868054
```

]

--

But if we **store these variables as factors,** the naughty things become impossible (yay!):

``` r
anx <- anx |> mutate(rating = factor(rating))
mean(anx$rating)
```

```
## [1] NA
```

---

.pull-left[

## The mistake

]

.pull-right[

## How you'll avoid it

]

.pull-left[
**A common R programming mistake:** Letting R treat all variables that consist of numbers as numeric. ] .pull-right[ ] .pull-left[
**An advanced statistical mistake:** Modelling categorical, ordinal data as if it were numeric. ] .pull-right[ ] .pull-left[
**A foundational statistical mistake:** Interpreting a significant *p*-value as evidence that an effect exists. ] .pull-right[ ] --- count:false .pull-left[ ## The mistake ] .pull-right[ ## How you'll avoid it ] .pull-left[
**A common R programming mistake:** Letting R treat all variables that consist of numbers as numeric. ] .pull-right[ When you know a variable is categorical, tell R that using `factor()`. ] .pull-left[
**An advanced statistical mistake:** Modelling categorical, ordinal data as if it were numeric. ] .pull-right[ ] .pull-left[
**A foundational statistical mistake:** Interpreting a significant *p*-value as evidence that an effect exists. ] .pull-right[ ] --- class: inverse, middle, center
# Modelling an ordinal variable

### The .mono-white[polr()] express

---

## Model ordinal data with `polr()`

polr = **P**roportional **O**dds **L**ogistic **R**egression

--

``` r
library(MASS)  # MASS contains the polr() function

anx_fit1 <- polr(
  rating ~ 1,        # intercept-only model, to start
  data = anx,
  Hess = TRUE,       # keep the Hessian so summary() can compute standard errors
  method = 'probit'  # more on this in a moment
)
```

---

## Model ordinal data with `polr()`

``` r
summary(anx_fit1)
```

```
## Call:
## polr(formula = rating ~ 1, data = anx, Hess = TRUE, method = "probit")
##
## No coefficients
##
## Intercepts:
##     Value    Std. Error t value
## 1|2  -0.8420   0.0157   -53.7268
## 2|3  -0.1678   0.0138   -12.1462
## 3|4   0.3833   0.0141    27.1512
## 4|5   1.0339   0.0168    61.6193
##
## Residual Deviance: 26596.28
## AIC: 26604.28
```

---

## What do those `Intercepts` mean?

--

<img src="data:image/png;base64,#demo_files/figure-html/plot-underlying-normal-1.png" width="864" style="display: block; margin: auto;" />

???

- Imagine that there's some underlying continuous distribution of anxiety, assumed to be standard normal. [show normal distribution]
- People with high anxiety are more likely to give high responses; people with low anxiety are more likely to give low responses. (Could do emojis relating to anxiety.)
- So to estimate how different anxiety levels translate to different responses on the 1--5 scale, we draw thresholds on that distribution. [add thresholds]
- People with anxiety in this bin will respond with 1, in this bin with 2, etc.
- And those thresholds, the cutpoints between ratings, are the intercepts.
- [show intercept estimates, put those same numbers on the thresholds]
- The normal distribution assumption comes from `method = 'probit'`. Other methods assume other underlying distributions, but the idea of thresholds is the same.

---

count: false

## What do those `Intercepts` mean?

<img src="data:image/png;base64,#demo_files/figure-html/plot-underlying-normal2-1.png" width="864" style="display: block; margin: auto;" />

---

count: false

## What do those `Intercepts` mean?

<img src="data:image/png;base64,#demo_files/figure-html/plot-underlying-normal3-1.png" width="864" style="display: block; margin: auto;" />

---

class: inverse, middle, center
# Stats anxiety and gender

---

#### How does a student's gender affect how they respond to "Going to ask my statistics teacher for individual help with material I am having difficulty understanding"?

--

.pull-left[

<img src="data:image/png;base64,#demo_files/figure-html/plot-gender-bars-1.png" width="504" style="display: block; margin: auto;" />

]

--

.pull-right[

<br>
<br>

**First:** Think to yourself about the question.

**Then:** Ask your neighbour what they think. What's their reasoning? What's yours?

**Afterward:** We'll look at the model's estimates together and discuss.

]

---

.center[
<img src="data:image/png;base64,#demo_files/figure-html/fem-normal-1.png" width="936" style="display: block; margin: auto;" />
]

---

count: false

<img src="data:image/png;base64,#demo_files/figure-html/fem-mal-normals-1.png" width="936" style="display: block; margin: auto;" />

---

count: false

<img src="data:image/png;base64,#demo_files/figure-html/all-gender-normals-1.png" width="936" style="display: block; margin: auto;" />

---

``` r
anx_fit2 <- polr(
  rating ~ gender,
  data = anx,
  method = 'probit',
  Hess = TRUE
)
summary(anx_fit2)
```

```
## Coefficients:
##                        Value Std. Error t value
## genderMale/Man       -0.3280    0.03015 -10.880
## genderAnother Gender  0.4846    0.11992   4.041
##
## Intercepts:
##     Value    Std. Error t value
## 1|2  -0.9045   0.0169   -53.5402
## 2|3  -0.2246   0.0150   -14.9847
## 3|4   0.3318   0.0151    21.9158
## 4|5   0.9889   0.0176    56.2958
```

---

.pull-left[

## The mistake

]

.pull-right[

## How you'll avoid it

]

<!-- -->

.pull-left[
**A common R programming mistake:** Letting R treat all variables that consist of numbers as numeric. ] .pull-right[ When you know a variable is categorical, tell R that using `factor()`. ] <!-- --> .pull-left[
**An advanced statistical mistake:** Modelling categorical, ordinal data as if it were numeric. ] .pull-right[ ] <!-- --> .pull-left[
**A foundational statistical mistake:** Interpreting a significant *p*-value as evidence that an effect exists. ] .pull-right[ ] --- count: false .pull-left[ ## The mistake ] .pull-right[ ## How you'll avoid it ] <!-- --> .pull-left[
**A common R programming mistake:** Letting R treat all variables that consist of numbers as numeric. ] .pull-right[ When you know a variable is categorical, tell R that using `factor()`. ] <!-- --> .pull-left[
**An advanced statistical mistake:** Modelling categorical, ordinal data as if it were numeric. ] .pull-right[ Apply and interpret ordinal regression models (e.g., `polr()` from `MASS`). ] <!-- --> .pull-left[
**A foundational statistical mistake:** Interpreting a significant *p*-value as evidence that an effect exists. ] .pull-right[ ] --- class: inverse, middle, center
# Interpreting *p*-values

---

## Are the effects of `gender` significant?

```
## Coefficients:
##                        Value Std. Error t value
## genderMale/Man       -0.3280    0.03015 -10.880
## genderAnother Gender  0.4846    0.11992   4.041
```

No *p*-values in the model summary.

--

But it's common practice to compare these *t*-values to a standard normal distribution.

--

``` r
pnorm(abs(-10.880), lower.tail = FALSE) * 2
```

```
## [1] 1.43563e-27
```

``` r
pnorm(abs( 4.041), lower.tail = FALSE) * 2
```

```
## [1] 5.322376e-05
```

???

Since both *p*-values are below 0.05:

- we CAN reject the null hypothesis that gender has no effect on ratings.
- **we CANNOT conclude that there really is an effect of gender.**

---

### Why don't significant *p*-values mean an effect exists?

Because we can also get significant *p*-values when there really is *no* effect.

--

.pull-left[

No difference in the true population:

<img src="data:image/png;base64,#demo_files/figure-html/true-skew-probdist-1.png" width="504" style="display: block; margin: auto;" />

]

--

.pull-right[

A possible random sample (*n* = 50 per group):

<img src="data:image/png;base64,#demo_files/figure-html/simdat-1.png" width="504" style="display: block; margin: auto;" />

]

---

### Why don't significant *p*-values mean an effect exists?

``` r
sim_fit <- polr(rating ~ group, data = simdat, method = 'probit', Hess = TRUE)
summary(sim_fit)
```

```
## Coefficients:
##                Value Std. Error t value
## groupGroup B -0.4479     0.2229  -2.009
```

<br>

--

``` r
pnorm(abs(-2.009), lower.tail = FALSE) * 2
```

```
## [1] 0.04453713
```

So *p* is below 0.05, but in the true population, Group A and Group B were identical!

---

.pull-left[

## The mistake

]

.pull-right[

## How you'll avoid it

]

<!-- -->

.pull-left[
**A common R programming mistake:** Letting R treat all variables that consist of numbers as numeric. ] .pull-right[ When you know a variable is categorical, tell R that using `factor()`. ] <!-- --> .pull-left[
**An advanced statistical mistake:** Modelling categorical, ordinal data as if it were numeric. ] .pull-right[ Apply and interpret ordinal regression models (e.g., `polr()` from `MASS`). ] <!-- --> .pull-left[
**A foundational statistical mistake:** Interpreting a significant *p*-value as evidence that an effect exists. ] .pull-right[ ] --- count:false .pull-left[ ## The mistake ] .pull-right[ ## How you'll avoid it ] <!-- --> .pull-left[
**A common R programming mistake:** Letting R treat all variables that consist of numbers as numeric. ] .pull-right[ When you know a variable is categorical, tell R that using `factor()`. ] <!-- --> .pull-left[
**An advanced statistical mistake:** Modelling categorical, ordinal data as if it were numeric. ] .pull-right[ Apply and interpret ordinal regression models (e.g., `polr()` from `MASS`). ] <!-- --> .pull-left[
**A foundational statistical mistake:** Interpreting a significant *p*-value as evidence that an effect exists. ] .pull-right[ Understand that significant *p*-values can arise even if no effect exists. ] -- <br> .center[**Thank you!
Time for questions!**]

---

count: false

## References

Elazar, A., Alhama, R. G., Bogaerts, L., Siegelman, N., Baus, C., & Frost, R. (2022). When the "tabula" is anything but "rasa": What determines performance in the auditory statistical learning task? *Cognitive Science*, 46(2), e13102.

Harrigan, K., Hogoboom, A., & Cochrane, L. (2022). Furthering student engagement: Lab sections in introductory linguistics. *Language*, 98(4), e199–e223.

Reeder, P. A., Newport, E. L., & Aslin, R. N. (2017). Distributional learning of subcategories in an artificial grammar: Category generalization and subcategory restrictions. *Journal of Memory and Language*, 97, 17–29.

Terry, J., Ross, R. M., Nagy, T., Salgado, M., Garrido-Vásquez, P., Sarfo, J. O., Cooper, S., Buttner, A. C., Lima, T. J. S., Öztürk, İ., Akay, N., Santos, F. H., Artemenko, C., Copping, L. T., Elsherif, M. M., Milovanović, I., Cribbie, R. A., Drushlyak, M. G., Swainston, K., … Field, A. P. (2023). Data from an International Multi-Centre Study of Statistics and Mathematics Anxieties and Related Variables in University Students (the SMARVUS Dataset). *Journal of Open Psychology Data*, 11(1), 8.

---

count: false

## Helpful resources

- Jamieson's (2004) paper _[Likert scales: How to (ab)use them](https://onlinelibrary.wiley.com/doi/10.1111/j.1365-2929.2004.02012.x)_
- UCLA Statistical Methods and Data Analytics's web page _[Ordinal Logistic Regression](https://stats.oarc.ucla.edu/r/dae/ordinal-logistic-regression/)_
- A. Solomon Kurz' (2021) blog post _[Notes on the Bayesian cumulative probit](https://stats.oarc.ucla.edu/r/dae/ordinal-logistic-regression/)_
- Gelman & Hill's (2007) book _[Data Analysis Using Regression and Multilevel/Hierarchical Models](https://www.cambridge.org/highereducation/books/data-analysis-using-regression-and-multilevel-hierarchical-models/32A29531C7FD730C3A68951A17C9D983)_
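
---

count: false

## Appendix: from thresholds to rating probabilities

The threshold picture from the "What do those `Intercepts` mean?" slides can be turned into numbers. This is a minimal sketch, not part of the original analysis. It assumes the standard normal latent variable implied by `method = 'probit'` and the `polr()` parameterisation, P(rating ≤ k) = Φ(threshold_k − shift), where the shift is a group's coefficient. The helper `rating_probs()` and its `shift` argument are hypothetical names introduced only for illustration. The numbers are copied by hand from the model summaries on the earlier slides, and the reference gender level is assumed to be Female/Woman.

```r
# Hypothetical helper: turn probit thresholds into rating probabilities.
# `thresholds` are the cutpoints on the latent scale; `shift` moves the
# latent mean (0 for the reference group, a coefficient for other groups).
rating_probs <- function(thresholds, shift = 0) {
  # Cumulative probabilities P(rating <= k), padded with 0 and 1
  cum <- c(0, pnorm(thresholds - shift), 1)
  # Each rating's probability is the area between adjacent thresholds
  probs <- diff(cum)
  names(probs) <- seq_along(probs)
  probs
}

# Thresholds copied from summary(anx_fit1): intercept-only model
zeta1 <- c(-0.8420, -0.1678, 0.3833, 1.0339)
round(rating_probs(zeta1), 3)

# Thresholds and Male/Man coefficient copied from summary(anx_fit2)
zeta2 <- c(-0.9045, -0.2246, 0.3318, 0.9889)
round(rating_probs(zeta2), 3)                   # reference level (assumed Female/Woman)
round(rating_probs(zeta2, shift = -0.3280), 3)  # Male/Man
```

For the intercept-only model, these probabilities should roughly match the observed proportion of each rating. A negative coefficient shifts the latent distribution to the left, so more probability falls below the lowest threshold and lower ratings become more likely.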